A first step towards computing all hybridization networks for two rooted binary phylogenetic trees
Recently, considerable effort has been put into developing fast algorithms to
reconstruct a rooted phylogenetic network that explains two rooted phylogenetic
trees and has a minimum number of hybridization vertices. With the standard
approach to tackle this problem being combinatorial, the reconstructed network
is rarely unique. From a biological point of view, it is therefore of
importance to not only compute one network, but all possible networks. In this
paper, we make a first step towards approaching this goal by presenting the
first algorithm---called allMAAFs---that calculates all
maximum-acyclic-agreement forests for two rooted binary phylogenetic trees on
the same set of taxa.
Comment: 21 pages, 5 figures
Kernelizations for the hybridization number problem on multiple nonbinary trees
Given a finite set X, a collection T of rooted phylogenetic
trees on X and an integer k, the Hybridization Number problem asks if there
exists a phylogenetic network on X that displays all trees from T
and has reticulation number at most k. We show two kernelization algorithms
for Hybridization Number, whose kernel sizes are bounded in terms of k together
with the number of input trees and their maximum outdegree. Experiments on
simulated data demonstrate the practical relevance of these kernelization
algorithms. In addition, we present a further algorithm whose running time is
bounded using some computable function of k.
Exact reconciliation of undated trees
Reconciliation methods aim at recovering macro evolutionary events and at
localizing them in the species history, by observing discrepancies between gene
family trees and species trees. In this article we introduce an Integer Linear
Programming (ILP) approach for the NP-hard problem of computing a most
parsimonious time-consistent reconciliation of a gene tree with a species tree
when dating information on speciations is not available. The ILP formulation,
which builds upon the DTL model, returns a most parsimonious reconciliation
ranging over all possible datings of the nodes of the species tree. By studying
its performance on plausible simulated data we conclude that the ILP approach
is significantly faster than a brute force search through the space of all
possible species tree datings. Although the ILP formulation is currently
limited to small trees, we believe that it is an important proof-of-concept
which opens the door to the possibility of developing an exact, parsimony based
approach to dating species trees. The software (ILPEACE) is freely available
for download.
On Computing the Maximum Parsimony Score of a Phylogenetic Network
Phylogenetic networks are used to display the relationship of different
species whose evolution is not treelike, which is the case, for instance, in
the presence of hybridization events or horizontal gene transfers. Tree
inference methods such as Maximum Parsimony need to be modified in order to be
applicable to networks. In this paper, we discuss two different definitions of
Maximum Parsimony on networks, "hardwired" and "softwired", and examine the
complexity of computing them given a network topology and a character. By
exploiting a link with the problem Multicut, we show that computing the
hardwired parsimony score for 2-state characters is polynomial-time solvable,
while for characters with more states this problem becomes NP-hard but is still
approximable and fixed parameter tractable in the parsimony score. On the other
hand we show that, for the softwired definition, obtaining even weak
approximation guarantees is already difficult for binary characters and
restricted network topologies, and fixed-parameter tractable algorithms in the
parsimony score are unlikely. On the positive side we show that computing the
softwired parsimony score is fixed-parameter tractable in the level of the
network, a natural parameter describing how tangled reticulate activity is in
the network. Finally, we show that both the hardwired and softwired parsimony
score can be computed efficiently using Integer Linear Programming. The
software has been made freely available.
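To make the two definitions in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the authors' software or their ILP formulation. It assumes the usual readings of the two scores: the hardwired score counts state changes over all edges of the network, while the softwired score takes the best tree obtained by keeping exactly one parent edge per reticulation. The function names and the small example network are hypothetical, and the enumeration is exponential, so it only suits toy inputs.

```python
from itertools import product

def min_changes(edges, nodes, leaf_state, states):
    """Brute force: fewest edges whose two endpoints receive different states."""
    internal = [v for v in nodes if v not in leaf_state]
    best = None
    for combo in product(states, repeat=len(internal)):
        state = {**leaf_state, **dict(zip(internal, combo))}
        score = sum(state[u] != state[v] for u, v in edges)
        best = score if best is None else min(best, score)
    return best

def hardwired(edges, leaf_state, states):
    """Hardwired score: every edge of the network is counted."""
    nodes = {x for e in edges for x in e}
    return min_changes(edges, nodes, leaf_state, states)

def softwired(edges, leaf_state, states):
    """Softwired score: best result over all choices of one parent per reticulation."""
    nodes = {x for e in edges for x in e}
    parents = {}
    for u, v in edges:
        parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    tree_edges = [(u, v) for u, v in edges if len(parents[v]) == 1]
    best = None
    for choice in product(*(parents[r] for r in retics)):
        kept = tree_edges + [(p, r) for p, r in zip(choice, retics)]
        score = min_changes(kept, nodes, leaf_state, states)
        best = score if best is None else min(best, score)
    return best

# Hypothetical network: h is a reticulation with parents u and v.
edges = [("r", "u"), ("r", "v"), ("u", "a1"), ("u", "a2"), ("u", "h"),
         ("v", "h"), ("v", "c1"), ("v", "c2"), ("h", "b")]
leaf_state = {"a1": 0, "a2": 0, "b": 0, "c1": 1, "c2": 1}
print(hardwired(edges, leaf_state, [0, 1]), softwired(edges, leaf_state, [0, 1]))
```

On this example the hardwired score is 2 and the softwired score is 1, because the softwired definition may ignore the edge from v to h.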
Treewidth-Based Algorithms for the Small Parsimony Problem on Networks
Phylogenetic reconstruction is one of the paramount challenges of contemporary bioinformatics. A subtask of existing tree reconstruction algorithms is modeled by the Small Parsimony problem: given a tree T and an assignment of character-states to its leaves, assign states to the internal nodes of T so as to minimize the parsimony score, that is, the number of edges of T connecting nodes with different states. While this problem is polynomial-time solvable on trees, the matter is more complicated if T contains reticulate events such as hybridizations or recombinations, i.e. when T is a network. Indeed, three different versions of the parsimony score on networks have been proposed and each of them is NP-hard to decide. Existing parameterized algorithms focus on combining the number of possible character-states with the number of reticulate events (per biconnected component). Here, we consider the treewidth of the undirected graph underlying the input network as parameter, presenting dynamic programming algorithms for (slight generalizations of) all three versions of the parsimony problem on networks. Our algorithms use a formulation of the treewidth that may facilitate formalizing treewidth-based dynamic programming algorithms on phylogenetic networks for other problems.
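The polynomial-time tree case mentioned in the abstract above can be illustrated with a short dynamic program. The sketch below is a standard unit-cost, Sankoff-style recursion in Python, not the treewidth-based algorithm of the paper. The function name, the dictionary-based tree encoding and the example tree are assumptions made for illustration.

```python
from math import inf

def small_parsimony(children, leaf_state, root, states):
    """Return the minimum number of tree edges whose endpoints differ in state."""
    def cost(v):
        # cost(v)[s] = best score of the subtree below v when v gets state s
        if v in leaf_state:
            return {s: (0 if s == leaf_state[v] else inf) for s in states}
        total = {s: 0 for s in states}
        for c in children[v]:
            child = cost(c)
            best_any = min(child.values())
            for s in states:
                # either the child keeps state s, or we pay 1 for a change
                total[s] += min(child[s], best_any + 1)
        return total
    return min(cost(root).values())

# Hypothetical example: the tree ((a,b),(c,d)) with one binary character.
children = {"r": ["u", "v"], "u": ["a", "b"], "v": ["c", "d"]}
leaf_state = {"a": 0, "b": 1, "c": 1, "d": 1}
print(small_parsimony(children, leaf_state, "r", [0, 1]))
```

For this input a single change suffices (on the edge leading to leaf a), so the score is 1. The recursion visits each node once and does work proportional to the square of the number of states per edge.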
A practical approximation algorithm for solving massive instances of hybridization number for binary and nonbinary trees
Reticulate events play an important role in determining evolutionary
relationships. The problem of computing the minimum number of such events to
explain discordance between two phylogenetic trees is a hard computational
problem. Even for binary trees, exact solvers struggle to solve instances with
reticulation number larger than 40-50. Here we present CycleKiller and
NonbinaryCycleKiller, the first methods to produce solutions verifiably close
to optimality for instances with hundreds or even thousands of reticulations.
Using simulations, we demonstrate that these algorithms run quickly for large
and difficult instances, producing solutions that are very close to optimality.
As a spin-off from our simulations we also present TerminusEst, which is the
fastest exact method currently available that can handle nonbinary trees: this
is used to measure the accuracy of the NonbinaryCycleKiller algorithm. All
three methods are based on extensions of previous theoretical work and are
publicly available. We also apply our methods to real data.
Do branch lengths help to locate a tree in a phylogenetic network?
Phylogenetic networks are increasingly used in evolutionary biology to
represent the history of species that have undergone reticulate events such as
horizontal gene transfer, hybrid speciation and recombination. One of the most
fundamental questions that arise in this context is whether the evolution of a
gene with one copy in all species can be explained by a given network. In
mathematical terms, this is often translated in the following way: is a given
phylogenetic tree contained in a given phylogenetic network? Recently this tree
containment problem has been widely investigated from a computational
perspective, but most studies have only focused on the topology of the
phylogenies, ignoring a piece of information that, in the case of phylogenetic
trees, is routinely inferred by evolutionary analyses: branch lengths. These
measure the amount of change (e.g., nucleotide substitutions) that has occurred
along each branch of the phylogeny. Here, we study a number of versions of the
tree containment problem that explicitly account for branch lengths. We show
that, although length information has the potential to locate a tree more
precisely within a network, the problem is computationally hard in its most general
form. On a positive note, for a number of special cases of biological
relevance, we provide algorithms that solve this problem efficiently. This
includes the case of networks of limited complexity, for which it is possible
to recover, among the trees contained by the network with the same topology as
the input tree, the closest one in terms of branch lengths.
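For orientation, the topology-only version of the tree containment question described above can be checked by brute force on small inputs. The Python sketch below ignores branch lengths entirely and is not one of the algorithms from the paper. It assumes that a network contains a tree when keeping one parent edge per reticulation can produce that tree, and it compares trees through their sets of leaf clusters. All names and the example network are hypothetical.

```python
from itertools import product

def clusters(edges, leaves):
    """Set of nonempty leaf sets found below the nodes of a rooted tree."""
    children = {}
    for u, v in edges:
        children.setdefault(u, []).append(v)
    memo = {}
    def below(v):
        if v not in memo:
            found = {v} if v in leaves else set()
            for c in children.get(v, []):
                found |= below(c)
            memo[v] = frozenset(found)
        return memo[v]
    nodes = {x for e in edges for x in e}
    return {below(v) for v in nodes if below(v)}

def displays(net_edges, tree_edges, leaves):
    """True if some choice of one parent per reticulation yields the tree."""
    target = clusters(tree_edges, leaves)
    parents = {}
    for u, v in net_edges:
        parents.setdefault(v, []).append(u)
    retics = [v for v, ps in parents.items() if len(ps) > 1]
    fixed = [(u, v) for u, v in net_edges if len(parents[v]) == 1]
    for choice in product(*(parents[r] for r in retics)):
        kept = fixed + [(p, r) for p, r in zip(choice, retics)]
        # Dead ends and degree-2 nodes add no new clusters, so comparing
        # cluster sets matches comparing the suppressed trees.
        if clusters(kept, leaves) == target:
            return True
    return False

# Hypothetical network on leaves a, b, c; h is a reticulation below u and v.
net = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
       ("v", "h"), ("v", "c"), ("h", "b")]
leaves = {"a", "b", "c"}
tree_ab = [("r", "x"), ("r", "c"), ("x", "a"), ("x", "b")]  # ((a,b),c)
tree_ac = [("r", "x"), ("r", "b"), ("x", "a"), ("x", "c")]  # ((a,c),b)
print(displays(net, tree_ab, leaves), displays(net, tree_ac, leaves))
```

The number of parent choices grows exponentially with the number of reticulations, which is consistent with the hardness results the abstract refers to.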